Reinforcement learning for LLMs - sukrucildirr